CLIR-Based Collaborative Construction of a Multilingual Terminological Dictionary for Cultural Resources

Author

  • Mohammad DAOUD
Abstract

We describe ongoing work on a collaborative environment for constructing a CLIR-based multilingual terminological dictionary dedicated to the Digital Silk Road project and website, launched and managed by NII (National Institute of Informatics, Tokyo, Japan). A considerable amount of cultural resources has been digitized, including 95 rare books written in 10 different languages. To make them easily searchable and accessible to the site's visitors, who are themselves multilingual, a cross-lingual information retrieval (CLIR) system is being built. Because these books are very rich in specialized terms, an important part of that endeavour is to gather these terms in many languages in a terminological dictionary (a database of terms containing information that could later be used to build a real terminological database). For that purpose, we use a participative approach: visitors of the online archive are the main source of terms in the languages they know, while multilingual online resources are used to initialize the term base through a process that depends on the archived textual data.

1 The first, third, and fourth authors work at the Grenoble Informatics Laboratory (GETALP), Université Joseph Fourier, Grenoble, France.
2 The second author works for the National Institute of Informatics, Tokyo, Japan.
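To illustrate how a multilingual term base can serve cross-lingual retrieval, here is a minimal Python sketch of dictionary-based query translation over such a base. It is not the authors' implementation: the `TermBase` class, its methods, the concept identifiers, the `source` provenance tag (distinguishing terms harvested from online resources from visitor contributions), and the greedy longest-match strategy are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class TermBase:
    """Hypothetical multilingual term base: each concept groups
    equivalent terms in several languages."""

    def __init__(self):
        # concept_id -> {language code -> set of terms}
        self.concepts = defaultdict(lambda: defaultdict(set))
        # (language code, lowercased term) -> set of concept_ids
        self.index = defaultdict(set)
        # (concept_id, language code, term) -> where the entry came from
        self.provenance = {}

    def add_term(self, concept_id, lang, term, source="resource"):
        """Record a term, either harvested from an online resource
        (initialization) or contributed by a site visitor."""
        self.concepts[concept_id][lang].add(term)
        self.index[(lang, term.lower())].add(concept_id)
        self.provenance[(concept_id, lang, term)] = source

    def translate(self, term, src, tgt):
        """Return every tgt-language term that shares a concept with term."""
        result = set()
        for cid in self.index.get((src, term.lower()), ()):
            result |= self.concepts[cid].get(tgt, set())
        return result

    def translate_query(self, query, src, tgt, max_len=3):
        """Dictionary-based query translation with greedy longest match,
        so that multi-word terms take priority over single words."""
        words = query.split()
        out, i = [], 0
        while i < len(words):
            for n in range(min(max_len, len(words) - i), 0, -1):
                translations = self.translate(" ".join(words[i:i + n]), src, tgt)
                if translations:
                    # Keep all candidate translations (no disambiguation)
                    out.extend(sorted(translations))
                    i += n
                    break
            else:
                # No entry found: keep the word untranslated
                out.append(words[i])
                i += 1
        return out


if __name__ == "__main__":
    tb = TermBase()
    tb.add_term("c1", "en", "Silk Road")
    tb.add_term("c1", "fr", "route de la soie")
    tb.add_term("c2", "en", "caravan")
    tb.add_term("c2", "fr", "caravane", source="visitor")
    print(tb.translate_query("Silk Road caravan trade", "en", "fr"))
    # -> ['route de la soie', 'caravane', 'trade']
```

Returning every candidate translation keeps recall high at the cost of ambiguity; a production CLIR system would likely add a disambiguation or weighting step on top of a sketch like this.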

Similar resources

Improved Cross-language Information Retrieval via Disambiguation and Vocabulary Discovery

Cross-lingual information retrieval (CLIR) allows people to find documents irrespective of the language used in the query or document. This thesis is concerned with the development of techniques to improve the effectiveness of Chinese–English CLIR. In Chinese–English CLIR, the accuracy of dictionary-based query translation is limited by two major factors: translation ambiguity and the presence ...

Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries

This paper proposes an efficient client-server-based query translation approach that allows more feasible implementation of cross-language information retrieval (CLIR) services in digital library (DL) systems. A centralized query translation server is constructed to process the translation requests of cross-lingual queries from connected DL systems. To extract translations not covered by standa...

Dbnary: Wiktionary as a LMF based Multilingual RDF network

Contributive resources, such as Wikipedia, have proved to be valuable in Natural Language Processing or Multilingual Information Retrieval applications. This article focusses on Wiktionary, the dictionary part of the collaborative resources sponsored by the Wikimedia ...

Building Specialized Multilingual Lexical Graphs Using Community Resources

We are describing methods for compiling domain-dedicated multilingual terminological data from various resources. We focus on collecting data from online community users as a main source; therefore, our approach depends on acquiring contributions from volunteers (explicit approach) and on analyzing users’ behaviors to extract interesting patterns and facts (implicit approach). As a ...

Dictionary-based CLIR for the CLEF Multilingual Track

This report describes the work done for our participation in the multilingual track of the Cross-Language Evaluation Forum (CLEF). We use a dictionary-based approach to translate English queries into German, French and Italian queries. We then apply a term disambiguation technique to select the best translation terms from the terms found in the dictionary entries, and a query expansion technique...

Publication date: 2010